TREC 7 Ad Hoc, Speech, and Interactive tracks at MDS/CSIRO

Authors

  • Michael Fuller
  • Marcin Kaszkiel
  • Dongki Kim
  • Corinna Ng
  • John Robertson
  • Ross Wilkinson
  • Mingfang Wu
  • Justin Zobel
Abstract

1 Overview

For the 1998 round of TREC, the MDS group, long-term participants at the conference, participated jointly with newcomers CSIRO. Together we completed runs in three tracks: ad hoc, interactive, and speech.

2 Ad-hoc task

In TREC-5 we used document retrieval based on arbitrary passages [8, 9], or fixed-length passages that could start at any word position. Although far from the best runs in TREC-5, these results were promising, in particular for long documents. In TREC-6 we continued with arbitrary passages, but our main emphasis was on comprehensive factor analysis of successful automatic query expansion and refinement methods in the context of the vector space model [5]. This year we have refined the MG retrieval system to include Rocchio-based relevance feedback. Phrase matching has also been added. We have continued to use arbitrary passages and combination of evidence for document retrieval.

An in-house version of the MG retrieval system has been used for all experiments. All experiments were carried out on an Intel Pentium II (300 MHz) with a single processor and 256 MB of physical memory.

Queries and documents were matched using the Okapi formulation [13]:

$$\mathrm{sim}(q, d) = \sum_{t \in q \wedge d} w_{d,t} \cdot w_{q,t} \qquad (1)$$

with

$$w_{d,t} = \frac{(k_1 + 1)\, f_{d,t}}{k_1 \left[ (1 - b) + b\, \frac{W_d}{\mathrm{avr}\,W_d} \right] + f_{d,t}}$$

and

$$w_{q,t} = \frac{(k_3 + 1)\, f_{q,t}}{k_3 + f_{q,t}} \cdot \log \frac{N - f_t + 0.5}{f_t + 0.5}$$

where $k_1$, $k_3$, and $b$ are constants set to 1.2, 1000, and 0.75 respectively, as recommended by the City University group [13]. The value $W_d$ is the length of document $d$ in bytes and $\mathrm{avr}\,W_d$ is the average document length in the entire collection. The value $N$ is the total number of documents in the collection, $f_t$ is the number of documents in which term $t$ occurs, and $f_{x,t}$ is the frequency of term $t$ in either document $d$ or query $q$. Okapi is not easily adaptable to arbitrary passage ranking because the parameters $k_x$ and $b$ are tuned to document ranking.
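As a rough illustration of formula (1), the following Python sketch scores one document against one query. It is not code from the MG system: the function name, its arguments, and the use of token lists and a plain dictionary for document frequencies are assumptions made for this example, and the logarithm is taken in the standard Okapi form with $f_t + 0.5$ in the denominator.

```python
import math
from collections import Counter

# Constants as quoted in the abstract (City University recommendation).
K1, K3, B = 1.2, 1000.0, 0.75


def okapi_similarity(query_terms, doc_terms, doc_len_bytes,
                     avg_doc_len_bytes, num_docs, doc_freq):
    """Sketch of formula (1): sum of w_dt * w_qt over terms in both q and d.

    doc_freq is a hypothetical mapping from a term to f_t, the number of
    documents in the collection that contain the term.
    """
    f_q = Counter(query_terms)  # f_{q,t}
    f_d = Counter(doc_terms)    # f_{d,t}
    # k1 * [(1 - b) + b * W_d / avr W_d]
    length_norm = K1 * ((1 - B) + B * doc_len_bytes / avg_doc_len_bytes)

    score = 0.0
    for term in f_q.keys() & f_d.keys():
        f_t = doc_freq[term]
        w_dt = (K1 + 1) * f_d[term] / (length_norm + f_d[term])
        w_qt = ((K3 + 1) * f_q[term] / (K3 + f_q[term])
                * math.log((num_docs - f_t + 0.5) / (f_t + 0.5)))
        score += w_dt * w_qt
    return score
```

With this form of the logarithm, a term that occurs in more than half of the documents receives a negative query weight, so a caller may want to drop such terms before scoring.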
Queries and passages are matched using a non-normalised version of the cosine similarity function:

$$\mathrm{sim}(q, p) = \sum_{t \in q \wedge p} \left( w_{q,t} \cdot w_{p,t} \right) \qquad (2)$$

with weights that have been shown to be robust and give good retrieval performance [1]:

$$w_{q,t} = \left( \log(f_{q,t}) + 1 \right) \cdot \log\!\left( \frac{N}{f_t} + 1 \right) \quad \text{and} \quad w_{p,t} = \log(f_{p,t}) + 1.$$

Automatic relevance feedback was based on the Rocchio …
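A minimal Python sketch of formula (2) follows, together with one possible way of applying it to fixed-length passages that may start at any word position. The function names, the token-list representation, and the choice of returning the single best-scoring passage are assumptions for illustration only; the abstract does not say how passage scores are turned into a document ranking.

```python
import math
from collections import Counter


def passage_similarity(query_terms, passage_terms, num_docs, doc_freq):
    """Sketch of formula (2): non-normalised cosine over shared terms.

    doc_freq is a hypothetical mapping from a term to f_t.
    """
    f_q = Counter(query_terms)    # f_{q,t}
    f_p = Counter(passage_terms)  # f_{p,t}
    score = 0.0
    for term in f_q.keys() & f_p.keys():
        w_qt = ((math.log(f_q[term]) + 1)
                * math.log(num_docs / doc_freq[term] + 1))
        w_pt = math.log(f_p[term]) + 1
        score += w_qt * w_pt
    return score


def best_passage(query_terms, doc_terms, passage_len, num_docs, doc_freq):
    """Score every window of passage_len words and keep the best one.

    Returns (score, start); on ties the earliest start position wins.
    Assumption: a document is represented by its best passage.
    """
    best_score, best_start = 0.0, 0
    last_start = max(len(doc_terms) - passage_len, 0)
    for start in range(last_start + 1):
        window = doc_terms[start:start + passage_len]
        score = passage_similarity(query_terms, window, num_docs, doc_freq)
        if score > best_score:
            best_score, best_start = score, start
    return best_score, best_start
```

This version rescores each window from scratch, which keeps the sketch short; a practical system would update the term counts incrementally as the window slides by one word.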

Related resources

TREC 7 Ad Hoc, Speech, and Interactive Tracks at MDS/CSIRO 2.1 System Description

In TREC-5 we used document retrieval based on arbitrary passages [8, 9], or fixed-length passages that could start at any word position. Although far from the best runs in TREC-5, these results were promising, in particular for long documents. In TREC-6 we continued with arbitrary passages, but our main emphasis was on comprehensive factor analysis of successful automatic query expansion and refine...

The RMIT/CSIRO Ad Hoc, Q&A, Web, Interactive, and Speech Experiments at TREC 8

The constants $k_1$, $k_3$ and $b$ were set to 1.2, 1000 and 0.75 respectively, as recommended by the City University group [13]. $W_d$ is the length of the document $d$ in bytes and $\mathrm{avr}\,W_d$ is the average document length in the entire collection. $N$ is the total number of documents in the collection, $f_t$ is the number of documents in which term $t$ occurs, and $f_{x,t}$ is the frequency of term $t$ in either a documen...

AT&T at TREC-7

This year AT&T participated in the ad-hoc task and the Filtering, SDR, and VLC tracks. Most of our effort for TREC-7 was concentrated on the SDR and VLC tracks. On the filtering track, we tested a preliminary version of a text classification toolkit that we have been developing over the last year. In the ad-hoc task, we introduce a new tf-factor in our term weighting scheme and use a simplified retrieva...

CSIRO Routing and Ad-Hoc Experiments at TREC-6

CSIRO stands for Commonwealth Scientific and Industrial Research Organization. It is the Australian Government’s main research body. This is the first year CSIRO is taking part in TREC. We got involved in textual information retrieval research as a part of our activities in Resource Discovery Unit at the Research Data Network Co-operative Research Centre. The primary aim of our research in IR i...

INQUERY and TREC-8

This year the Center for Intelligent Information Retrieval (CIIR) at the University of Massachusetts participated in seven of the tracks: ad-hoc, filtering, spoken document retrieval, small web, large web, question and answer, and the query tracks. We spent significant time working on the filtering track, resulting in substantial performance improvement over TREC-7. For all of the other tracks, we u...

Publication date: 1998